Social and place-focused communities in location-based online social networks 
Thanks to widely available, cheap Internet access and the ubiquity of smartphones, millions of 
people around the world now use online location-based social networking services. Understanding 
the structural properties of these systems and their dependence upon users' habits and mobility 
has many potential applications, including resource recommendation and link prediction. Here, we 
construct and characterise social and place-focused graphs by using longitudinal information about 
declared social relationships and about users' visits to physical places collected from a popular 
online location-based social service. We show that although the social and place-focused graphs 
are constructed from the same data set, they have quite different structural properties. We find 
that the social and location-focused graphs have different global and meso-scale structure, and in 
particular that social and place-focused communities have negligible overlap. Consequently, group 
inference based on community detection performed on the social graph alone fails to isolate place- 
focused groups, even though these do exist in the network. By studying the evolution of tie structure 
within communities, we show that the time period over which location data are aggregated has a 
substantial impact on the stability of place-focused communities, and that information about place- 
based groups may be more useful for user-centric applications than that obtained from the analysis 
of social communities alone. 



I. INTRODUCTION 

Networks can describe a large variety of complex sys- 
tems, and network science has proved to be a success- 
ful framework for the quantitative study of their struc- 
ture and dynamics. In the last decade, the tools 
and models provided by complex network theory have 
enabled discovery of similarities between seemingly very 
different systems including the Internet, the human proteome, 
and collaboration networks. Complex networks 
analysis is now regularly employed to characterise the 
topology and functioning of biological, technological and 
social structures [J, Q . 

The analysis of social networks is one of the tradi- 
tional application fields of network science, and sociol- 
ogists generally agree that social behaviours, from opin- 
ion formation to rule enforcement, from individual suc- 
cess to cooperation, depend in a fundamental way on 
the structure and evolution of the patterns of social rela- 
tionships. In other words, characterising and quantifying 
social structures is often a prerequisite for understanding 
and interpreting social dynamics [6-8]. In the last twenty 
years, sociologists have relied on the study of small so- 
cial networks with tens or hundreds of nodes at most, 
collected by means of targeted questionnaires and direct 
interviews. Recently, the ubiquity of the Internet and 
the World Wide Web, and the emergence of hundreds 
of online social services, have produced a huge volume 
of data about online relationships between millions of 
people around the world. These online social networks 
(OSNs) have allowed quantitative verification of socio- 
logical theories on an unprecedented scale. Analysis of 
a wide variety of online social systems has allowed in- 
sights into the dynamics of human behaviours, including 
bond formation, cooperation, imitation, and synchronisation [9-11]. 
However, the extent of the correspondence 
between people's online activities and their offline lives 
is still the subject of debate [12-14]. 

One problem of interest in social network analysis is 
that of identifying communities, cohesive groups of peo- 
ple who are more tightly interconnected to each other 
than to the rest of the network. Communities can be 
exploited in a wide range of practical applications, in- 
cluding obtaining coarse-grained visual representations 
of large networks, sorting personal online contacts into 
manageable groups, finding partitions to speed up the 
performance of services or providing personalised recommendations. 
Many methods have been proposed 
in the last decade to find the best partition of a graph into 
a set of meaningful communities [TMHH, and the com- 
munity structure of OSNs has recently been the subject 
of much research [22-24]. 



At the same time, OSNs are becoming increasingly 
location-aware, meaning that user-produced content has 
an associated place. Examples include Facebook's recent 
introduction of the ability to tag any post with a location [25], 
geo-tagged tweets in Twitter [26], and explicitly 
location-based social networks. The most popular of 
these, Foursquare, has almost 35 million users [27]. Much 
research has shown that online social ties are more likely 
to form between spatially close users than between those 
further apart [13, 28-31]. However, the exact role played 
by space in the formation and evolution of communities 
is still unclear. 

In this work we study Gowalla, an online location- 
based social network, and analyse friendship and co- 
location networks obtained from longitudinal data cor- 
responding to more than 3 months of activity of around 
150,000 users. We focus on the structural properties 
and evolution of social and local communities, defined 
as tightly-connected groups of nodes in the social and in 
the co-location graphs respectively, and find that in gen- 
eral the overlap between social and local communities is 
small, if not negligible. A local community is rarely a 
proper subset of a social community, and usually con- 
tains members belonging to different social groups. Fur- 
thermore, the probability of two unconnected nodes be- 
coming connected is much higher if they belong to the 
same local community, and users in the same local com- 
munity who have not been in the same place are more 
likely later to visit a common place than users in the 
same social community. Finally, while the structure of 
the social communities is relatively stable over time, lo- 
cal communities are more dynamic and volatile. The 
differences between social and local communities high- 
lighted in this work provide a first piece of evidence that 
the standard approach to social group inference, based on 
the detection of communities in the social graph, can fail 
to capture the microscopic dynamics of local groups. Our 
results suggest that information derived from social com- 
munity analysis should be appropriately complemented 
with knowledge about individual activity before being 
used in user-centric applications such as providing friend 
suggestions or place recommendations. 



II. THE DATASET 

We analyse data from Gowalla, an online location- 
based social network founded in 2009 and discontinued 
at the end of 2011, when the company was bought by 
Facebook. The service allowed users to declare friendship 
ties to other users, thus forming a social network. The 
main user activity in Gowalla was the check-in: users in- 
dicating their presence at specific, named venues using 
a mobile phone application. When users checked in to 
places in this way, geo-located and time-stamped records 
were stored in the system, and their friends in the social 
network were notified of their location. 



A. Data collection 

Our dataset consists of a series of daily crawls of 
Gowalla downloaded between 4th May and 19th August 
2010, obtained using the public API provided by Gowalla 
to allow other applications and services to access their 
content. Each user is identified by an anonymised nu- 
meric ID and has an associated profile including social 
connections and check-ins. We downloaded these profiles 
from the service daily over the crawling period, meaning 
that for each day we have complete information about 
the social graph (all the friendship ties between users at 
that time), and about all the user check-ins. Each check-in 
consists of the venue name, category, location (latitude 
and longitude), the ID of the user who made the check-in, 
and a timestamp. We also have a record of all the 
check-ins that had taken place before the measurement 
period began, but we do not have the state of the social 
network corresponding to this period. 

In Gowalla each place is represented as a named venue, 
such as 'Starbucks', 'Kings Cross Railway Station' or 
'Computer Laboratory', with latitude and longitude val- 
ues so that the correct 'Starbucks' for the user's location 
can be identified. The user therefore checks in to a spe- 
cific place, rather than being located using coordinates 
alone. We can therefore identify when users actually 
visit the same places rather than just being in geographic 
proximity, e.g., in two shops next door to one another. 
Having crawled Gowalla daily, we are able to examine 
closely which social ties were formed and deleted during 
the data collection period, and gain insight into the dy- 
namics of the network structure at the level of individual 
links. The crawl was performed when the network was 
already fairly large and steadily growing, not during the 
explosive growth period typically observed shortly after 
the creation of such online social services, when their 
popularity increases exponentially [23]. 



B. Data processing 

Since we have two kinds of information about users, 
i.e. the places where they have checked in, and their 
connections in the social network, given a time interval 
we can construct two different graphs. The first graph 
$G = (V, E)$ represents the social network: each user 
present in the system during the considered time interval 
is represented by one of the $N = |V|$ nodes of the set $V$, 
and $E$ is a set of $K$ edges (or ties) between nodes. The 
edge $(u_1, u_2)$ exists in $E$ between users $u_1, u_2 \in V$ if $u_1$ 
and $u_2$ are friends in the OSN in that time interval. We 
represent a graph by the adjacency matrix $A = \{A_{ij}\}$, in 
which the entry $A_{ij} = 1$ if there exists a link connecting 
node $u_i$ and node $u_j$, and $A_{ij} = 0$ otherwise. The number 
of neighbours of a node $u_i$ is called the degree of $u_i$, and 
is denoted by $k_i = \sum_j A_{ij}$. In the following we refer to 
the average degree of a graph $\langle k \rangle = 2K/N$. In Gowalla, 
ties are bidirectional and indistinguishable, so the social 
graphs we construct are undirected and unweighted, and 
the associated adjacency matrix is symmetric. 
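
As a minimal sketch of the quantities just defined, the snippet below computes node degrees and the mean degree $\langle k \rangle = 2K/N$ from an undirected edge list. The function name and the toy friendship list are illustrative assumptions, not part of the Gowalla data or of the authors' code.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: degree} for an undirected, unweighted edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops are not meaningful for friendship ties
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}

# Toy friendship list with hypothetical user IDs (not Gowalla data).
friendships = [(1, 2), (2, 3), (3, 1), (3, 4)]

k = degrees(friendships)          # k_i = sum_j A_ij
N = len(k)                        # number of nodes
K = sum(k.values()) // 2          # each edge is counted twice in the degree sum
mean_degree = 2 * K / N           # <k> = 2K/N
```

Storing neighbour sets rather than a dense matrix keeps the sketch usable for sparse graphs with many nodes, where the full adjacency matrix would be mostly zeros.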

FIG. 1. (colour online) Structural properties of the social (black circles) and placefriends graphs (red squares) over time. 
Panel a): the number of nodes $N$ in both graphs increases exponentially with time (left), and the mean degree $\langle k \rangle$ increases 
only slightly (right). Panel b): the mean shortest path length $\langle L \rangle$ divided by the expected value in a 
corresponding Erdős-Rényi graph $\langle L_{ER} \rangle$, for the social and the placefriends graphs (left). Both networks 
have a relatively high clustering coefficient (right). Panel c): the degree distribution of the social graph (left, $\gamma \simeq 2.52$) 
is typically more heterogeneous than that of the placefriends graph (right, $\gamma \simeq 2.84$). 
Panel d): due to the presence of many super-hubs, the average degree of first neighbours of a node having degree $k$ in the social 
graph is a decreasing function of $k$ (left, disassortative degree correlations); conversely, the placefriends graph has assortative 
correlations (right). The results in panel c) and panel d) correspond to the whole observation interval. 

Using the information about places where users check 
in, we can define the notion of placefriends: users (not 
necessarily having a social tie) who have checked in to 
one or more of the same places. Since our aim is to 
investigate the relationship between online and off-line 
social groups, we are particularly interested in users who 
have checked in to one or more of the same places as 
their online friends, that is, users who are both friends 
and placefriends. Given a time interval and the corresponding 
social graph $G = (V, E)$, we define the associated 
placefriends-social graph $G^p = (V^p, E^p)$ as the 
subgraph of $G$ such that $V^p$ contains all the nodes in 
$V$ having at least one friend who is also a placefriend, 
and $E^p$ is the subset of edges $(u_1, u_2) \in E$ such that $u_1$ 
and $u_2$ are both friends and placefriends. We call $N^p$ 
and $K^p$, respectively, the number of nodes and the number 
of edges in the placefriends graph. For convenience, 
we henceforth refer to the placefriends-social graph simply 
as the placefriends graph, and to the edges of the 
placefriends graph as placefriends edges (or ties). Each 
placefriends edge has a weight corresponding to the actual 
number of places where the connected users have 
both checked in during the relevant time interval. Consequently, 
the placefriends graphs are undirected and 
weighted. A weighted graph can be represented by a 
weighted adjacency matrix $W = \{W_{ij}\}$, where the entry 
$W_{ij}$ indicates the weight of the link between node $u_i$ and 
node $u_j$, if it exists, while $W_{ij} = 0$ if $u_i$ and $u_j$ do not 
have a link. Given a node $u_i$ in a weighted graph we 
define its strength $\sigma_i$ as the sum of the weights incident 
on $u_i$, i.e. $\sigma_i = \sum_j W_{ij}$. 
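
The construction of the placefriends graph can be sketched as follows, assuming that check-ins within the chosen time interval have already been reduced to one set of venue IDs per user. The function names, user labels and venue labels are hypothetical and serve only to illustrate the definition of edge weights and node strengths.

```python
def placefriends_graph(friendships, checkins):
    """Build the weighted placefriends graph.

    friendships: iterable of (u, v) pairs (undirected social ties).
    checkins: dict mapping user -> set of venue IDs visited in the interval.
    Returns {frozenset({u, v}): weight}, where the weight is the number of
    venues at which both users have checked in.
    """
    weights = {}
    for u, v in friendships:
        shared = checkins.get(u, set()) & checkins.get(v, set())
        if shared:  # keep only ties between users who are also placefriends
            weights[frozenset((u, v))] = len(shared)
    return weights

def strengths(weights):
    """Node strength: sum of the weights of the edges incident on each node."""
    s = {}
    for pair, w in weights.items():
        for node in pair:
            s[node] = s.get(node, 0) + w
    return s

# Toy data with hypothetical users and venues.
friendships = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
checkins = {
    "a": {"cafe", "station"},
    "b": {"cafe", "station", "lab"},
    "c": {"lab"},
    "d": {"park"},
}

W = placefriends_graph(friendships, checkins)
sigma = strengths(W)
```

In this toy example user "d" has a friend but no placefriend, so it does not appear in the placefriends graph, which mirrors the definition of $V^p$ as the set of nodes having at least one friend who is also a placefriend.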



III. STRUCTURE OF SOCIAL AND 
PLACEFRIENDS GRAPHS 



We analyse the structure of the social and placefriends 
graphs in Gowalla, focusing on the temporal evolution of 
communities. In order to study the temporal evolution 
of these graphs, we divided the original dataset into 8 
snapshots, each covering a period of 2 weeks except the 
last one, which is 9 days long. Table I reports the number 
of new check-ins and new unique places per day at each 
snapshot, and the total number of check-ins and unique 
places in the dataset at the end of each snapshot. The 
total number of check-ins in the first snapshot refers to 
all check-ins recorded in the system since the inception of 
Gowalla. At the time of crawling the system was steadily 
growing, with around 5,000 new places visited every day. 
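
The snapshot boundaries can be reproduced with a short sketch. It assumes half-open intervals that start on 4 May 2010 and treat 19 August 2010 as the exclusive end of the crawl; the paper does not state how the boundary days were handled, so this convention is an assumption, as is the function name.

```python
from datetime import date, timedelta

def snapshot_bounds(start, end, width_days=14):
    """Split [start, end) into consecutive windows of width_days; the last may be shorter."""
    bounds = []
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=width_days), end)
        bounds.append((lo, hi))
        lo = hi
    return bounds

# Crawl period reported in Section II A (boundary convention assumed).
snapshots = snapshot_bounds(date(2010, 5, 4), date(2010, 8, 19))
lengths = [(hi - lo).days for lo, hi in snapshots]
```

Under this convention the crawl spans 107 days, which yields seven two-week windows followed by a final window of 9 days, in agreement with the description above.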



Snapshot   NCh      NP      TCh         TP
1                           4,946,778   1,023,991
2          49,562
3          59,055
4          59,846   4,785   7,305,267   1,228,323
5          52,085   4,876   8,034,466   1,296,594
6          53,878   5,061   8,788,764   1,367,448
7          57,941   4,921   9,599,945   1,436,352
8          32,106   2,960   9,888,905   1,462,993

TABLE I. The mean number of new check-ins per day (NCh), 
mean number of new unique places per day (NP), total number 
of check-ins (TCh), and total number of places (TP) in the 
dataset at the end of each snapshot. The number of check-ins 
and the number of places grew steadily during the crawl. 
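
The per-day means and the running totals of Table I can be cross-checked against each other. The sketch below takes the totals that immediately precede snapshot 5 in the table as those of snapshot 4, assumes window lengths of 14 days (9 for the last snapshot), and assumes that the reported daily means are rounded down; these are assumptions made for illustration, not statements from the paper.

```python
# Totals (TCh, TP) at the end of snapshots 4-8, as listed in Table I.
totals = {
    4: (7305267, 1228323),
    5: (8034466, 1296594),
    6: (8788764, 1367448),
    7: (9599945, 1436352),
    8: (9888905, 1462993),
}

# (window length in days, NCh, NP) reported for snapshots 5-8.
reported = {
    5: (14, 52085, 4876),
    6: (14, 53878, 5061),
    7: (14, 57941, 4921),
    8: (9, 32106, 2960),
}

checks = {}
for snap, (days, nch, n_places) in reported.items():
    prev_tch, prev_tp = totals[snap - 1]
    tch, tp = totals[snap]
    daily_checkins = (tch - prev_tch) // days   # mean new check-ins per day, rounded down
    daily_places = (tp - prev_tp) // days       # mean new places per day, rounded down
    checks[snap] = (daily_checkins == nch, daily_places == n_places)
```

For these rows the differences between consecutive totals, divided by the window length, reproduce the reported daily means, which supports reading NCh and NP as averages over each snapshot.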



A. Basic network properties 

Fig. 1 reports the basic structural properties of the social 
and placefriends graphs corresponding to each of the 
eight snapshots. The number of nodes in the largest connected 
components of both the social (black circles) and 
the placefriends graphs (red squares) increases exponentially 
over time (Fig. 1a), confirming that at the time of 
data acquisition the system was still growing. In particular, 
the number of users in the largest connected component 
of the social graph increased from around 100,000 
to around 150,000, while the size of the largest connected 
component in the placefriends graph grew from around 
52,000 to around 75,000. The average node degree in 
the social graph (right panel of Fig. 1a) remains almost 
constant over the 8 snapshots, indicating that the social 
network was already well-established and stable at 
the time of the crawl. Conversely, the average degree of 
the placefriends graph increases from around 6.2 in the 
first snapshot to around 6.8 in the last, showing that the 
placefriends graph becomes denser with time. 

Both the social and the placefriends graphs are small-world 
networks, as confirmed by Fig. 1b, which reports 
the values of the relative characteristic path length (the 
average distance between any pair of nodes divided by 
the expected value of this quantity in an Erdős-Rényi 
graph with the same number of nodes and links) and the 
mean node clustering coefficient (the mean percentage of 
closed triads incident to a node). The relative characteristic 
path length in the placefriends graph is smaller 
than that of the social graph, and the average clustering 
coefficient of the placefriends graphs is consistently 
higher than that of the social graph, indicating that on 
average a node in the placefriends graph is surrounded by 
neighbours who in turn have a high probability of having 
been to a common place. This effect can be explained by 
observing that users of online social networking services 
may add as friends people they meet only occasionally, if 
ever, since creating and maintaining this kind of online 
friendship does not involve any real cost or effort. Conversely, 
in order to be considered placefriends, two users 
have to have been to the same place, meaning that their 
activities are focused around a certain geographical area. 
This makes it more probable that two friends with whom 
a user shares a place have themselves both been to 
some place in the same area. 
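
The two small-world indicators discussed above can be sketched in plain Python. The mean clustering coefficient and the characteristic path length follow the definitions in the text; the reference value for the Erdős-Rényi graph uses the common estimate $\ln N / \ln \langle k \rangle$, which is an assumption here, since the paper does not say how the expected value was obtained. All function names and the toy graph are illustrative.

```python
import math
from collections import deque
from itertools import combinations

def mean_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += closed / (k * (k - 1) / 2)
    return total / len(adj)

def mean_path_length(adj):
    """Average shortest-path length over reachable ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def er_path_length_estimate(n_nodes, n_edges):
    """Rough ln(N)/ln(<k>) estimate for a random graph (assumed reference value)."""
    return math.log(n_nodes) / math.log(2 * n_edges / n_nodes)

# Toy graph: a triangle (1, 2, 3) with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
C = mean_clustering(adj)
L = mean_path_length(adj)
L_rel = L / er_path_length_estimate(n_nodes=4, n_edges=4)
```

The all-pairs breadth-first search is quadratic in the number of nodes, so on graphs of the size studied here one would typically estimate the path length from a random sample of source nodes instead.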



B. Degree distributions and degree correlations 

In Fig. 1c we show the degree distribution, i.e. the 
probability $P(k)$ of finding a node having degree equal 
to $k$, of the social and placefriends graphs corresponding 
to the whole dataset. The two degree distributions 
exhibit power-law tails, i.e. $P(k) \sim k^{-\gamma}$ for large $k$, indicating 
that the form of the distribution is scale-invariant 
and that there is a non-negligible probability of nodes 
having a large number of neighbours. However, the tails 
of the two distributions are characterised by two different 
values of $\gamma$. Since the exponent of a power-law degree 
distribution is an indirect measure of the heterogeneity 
of the degrees, with larger exponents corresponding to 
more homogeneous distributions, we can conclude that 
the node degrees of the placefriends graph usually are 
more homogeneously distributed. The maximum degree 
of the placefriends graph is much smaller than that of 
the social graph ($k_{max} \simeq 1000$ in the placefriends graph 
while $k_{max} \simeq 10,000$ in the social graph). These observations 
are consistent with the fact that place-friendship 
is much more costly and demanding than purely online 
friendship. 
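
The paper does not state how the exponents $\gamma$ were estimated. As one possible approach, the sketch below uses the continuous maximum-likelihood estimator $\hat{\gamma} = 1 + n / \sum_i \ln(x_i / x_{min})$ and checks it on synthetic samples drawn from a known power law. The estimator choice, the cut-off `x_min` and the synthetic data are assumptions for illustration; degrees are discrete, so this is only an approximation for real degree sequences.

```python
import math
import random

def powerlaw_exponent_mle(values, x_min):
    """Continuous maximum-likelihood estimate of gamma for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic check: draw samples from P(x) ~ x^(-gamma_true) with x >= 1
# by inverse-transform sampling, using a fixed seed for reproducibility.
rng = random.Random(0)
gamma_true = 2.5
samples = [(1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0)) for _ in range(20000)]

gamma_hat = powerlaw_exponent_mle(samples, x_min=1.0)
```

A smaller estimated exponent corresponds to a heavier tail, which is the sense in which the social graph ($\gamma \simeq 2.52$) is more heterogeneous than the placefriends graph ($\gamma \simeq 2.84$).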

For complex networks, the degree distribution alone 
is often not enough to fully characterise the microscopic 
structure. Many networks exhibit degree-degree correlation, 
meaning that the existence of a link between two 
nodes having respective degrees $k$ and $k'$ is a function of 
both $k$ and $k'$ [32, 33]. Networks can be either assortative 
(nodes of a certain degree are preferentially linked to 
nodes with similar degrees) or disassortative (highly connected 
nodes are preferentially linked with other nodes 
having small degree, and vice versa). The assortativity 
of a network can be quantified by looking at the average 
degree $k_{nn}(k)$ of the first neighbours of nodes having 
degree $k$, as a function of $k$. For assortative networks, 
$k_{nn}(k)$ is an increasing function of $k$, while for disassortative 
networks $k_{nn}(k)$ decreases with $k$. Quite often, 
$k_{nn}(k)$ is a power-law, i.e. $k_{nn}(k) \sim k^{\nu}$; in these cases, 
the exponent $\nu$ can be used to quantify the assortativity 
of the network, with more positive values of $\nu$ indicating 
more assortative networks and more negative values of $\nu$ 
corresponding to disassortative graphs. 
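
A minimal sketch of how $k_{nn}(k)$ can be computed from an adjacency structure is given below. The function name and the toy star graph are illustrative assumptions; the star is chosen because it is the simplest strongly disassortative case, with a hub connected only to degree-one nodes.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Return {k: average degree of the first neighbours of nodes having degree k}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes have no neighbours to average over
        sums[k] += sum(len(adj[v]) for v in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy star graph: hub 0 linked to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = knn_by_degree(star)
```

Here $k_{nn}(1) = 4$ and $k_{nn}(4) = 1$, so $k_{nn}(k)$ decreases with $k$, as expected for a disassortative graph. Fitting $k_{nn}(k) \sim k^{\nu}$ on a log-log scale would then give a negative $\nu$.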

Fig. 1d reports the value of $k_{nn}(k)$ for the social and 
the placefriends graphs. Notice that while the social 
graph is markedly disassortative ($\nu \simeq -0.28$), the placefriends 
graph is assortative ($\nu \simeq 0.26$). This means that 
hubs in the social graph preferentially link to poorly-connected 
nodes, while nodes in the placefriends graph 
tend to be connected with other nodes having similar degree. 
We hypothesize that the disassortativity of the social 
graph may be due to the nature of Gowalla as an online 
social service: its most active users, who would probably 
add the most friends, tended to be 'early adopters' 
and people who were particularly interested in new online 
services and technology. These people would have a lot of 
connections and might convince their friends to sign up 
to the service, but these friends could be less interested 
and maybe only add one or two friends before stopping 
using Gowalla. Such patterns of behaviour could give rise 
to the kind of disassortativity we see in the social graph. 

The results reported in Fig. 1 confirm that the struc- 
ture of the social graph constructed from friendship de- 
clared by Gowalla users is fundamentally different from 
the structure of the corresponding placefriends graphs 
obtained from check-in information. This means that so- 
cial ties alone are probably not a good proxy of users' 
activity, and that information about friendships needs 
to be appropriately complemented with other knowledge 
before being used to draw conclusions about users' dy- 
namics. 



IV. SOCIAL AND LOCAL COMMUNITIES 

We have seen that despite being constructed from the 
same dataset, the social and placefriends graphs are quite 
different with respect to heterogeneity and assortativity. 
We now focus on the community structures of the two 
graphs, in order to understand whether these discrepancies 
also reflect a different meso-scale organisation. In 
general, a community is a subset of nodes of a graph 
which are tightly connected to each other. Depending on 
the precise definition of community employed [21], one 
can require that in order to form a proper community a 
subset of nodes should either be more tightly connected 
than expected in a null model [20] or should instead have 
more internal links, i.e. edges between nodes belonging 
to the community, than external ones, i.e. those connecting 
a node inside the community with a node outside 
the community [18]. We employ the former definition, 
and we consider partitions obtained using the Louvain 
method [34], a greedy agglomerative community detection 
algorithm based on modularity optimisation [20, 35]. 
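
The quantity optimised by modularity-based methods can be illustrated with a short sketch. It uses the standard Newman-Girvan modularity for unweighted graphs, $Q = \sum_c [K_c / K - (d_c / 2K)^2]$, where $K_c$ is the number of edges inside community $c$ and $d_c$ is the sum of the degrees of its nodes. This formula is not spelled out in the text, and the sketch only evaluates $Q$ for a given partition; it does not implement the Louvain method itself.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    K = len(edges)
    internal = {}    # number of edges with both endpoints in community c
    degree_sum = {}  # sum of the degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / K - (d / (2 * K)) ** 2
               for c, d in degree_sum.items())

# Toy graph: two triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}

Q_two = modularity(edges, two_groups)
Q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives $Q = 5/14$, whereas placing all nodes in a single community gives $Q = 0$, so the two-community partition is the one that is more tightly connected than expected under the null model.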

It has been observed that some OSNs contain groups 
of users who are online friends and also happen to visit 
the same places in the physical world [36]; in practice, 
these are groups of placefriends who also form a com- 
munity in the social graph. Since we are interested in 
understanding the relationship between online and place- 
focused communities, and in particular in quantifying the 
extent to which the structure of the social graph mir- 
rors the activity of users visiting the same places, we will 
compare the results of community detection performed 
on the Gowalla social graphs and on the corresponding 
placefriends graphs. In the following we call the com- 
munities of the social graph social-only communities, or 
simply social communities, and refer to the communities 
of the placefriends graphs as local communities. 

When tracking the evolution of communities over time, 
we need to be confident that changes in the communities 
in two different temporal snapshots are due to the changing 
structure of the network, not to peculiarities of the 
community detection algorithm. Many greedy community 
detection methods, including the Louvain method, 
are non-deterministic and can give different output depending 
on the order of the input [21]. To address this 
problem, we adopted the algorithm proposed by Kwak et 
al. [13]. The method works as follows. A chosen community 
detection algorithm able to handle weighted graphs 
(e.g., the Louvain method) is run M times on the same 
network, with the input being given in a different, randomised 
order each time, thus obtaining M community 
partitions of the graph. In principle, if the graph has 
a strong community structure then these M partitions 
should differ only in the community placement of a relatively 
small subset of nodes. Then, the network is re-weighted 
according to the frequency with which pairs of 
nodes have been placed in the same community across 
the M partitions. In particular, the weight of an 
edge connecting nodes i and j is increased or decreased 
proportionally to the number of times that i and j have 
been put in the same community in the M runs. 
The re-weighting procedure has the effect of reinforcing 
more robust groupings over those appearing by chance 
or due to a particular input ordering. Then, the process 
is iterated on the re-weighted network, until a consistent 
placement of nodes into communities is obtained and all 
the partitions obtained in the M runs of an iteration are 
identical. Although Kwak et al. noted in their paper that 
the convergence of their method cannot be guaranteed for 
certain graphs, we did not encounter this problem for our 
networks, and we were able to find a stable partition of 
each graph. 
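
The re-weighting and stopping steps of this procedure can be sketched as follows. The community detection runs themselves are not implemented: the partitions are supplied as plain dictionaries, as if they were the output of M randomised runs. The exact re-weighting rule of Kwak et al. is not given in the text, so setting the new weight to the fraction of runs in which both endpoints share a community is an assumption, as are all function names.

```python
def coassignment_weights(edges, partitions):
    """Weight each edge by the fraction of partitions placing both endpoints together."""
    M = len(partitions)
    return {
        (u, v): sum(1 for p in partitions if p[u] == p[v]) / M
        for u, v in edges
    }

def groups(partition):
    """Set of node groups induced by a {node: label} partition, ignoring label names."""
    by_label = {}
    for node, label in partition.items():
        by_label.setdefault(label, set()).add(node)
    return {frozenset(nodes) for nodes in by_label.values()}

def converged(partitions):
    """True when all partitions induce exactly the same groups of nodes."""
    first = groups(partitions[0])
    return all(groups(p) == first for p in partitions[1:])

# Hypothetical output of M = 3 runs on a four-node path graph.
edges = [(1, 2), (2, 3), (3, 4)]
p1 = {1: 0, 2: 0, 3: 1, 4: 1}
p2 = {1: "x", 2: "x", 3: "x", 4: "y"}
p3 = {1: 5, 2: 5, 3: 7, 4: 7}

new_weights = coassignment_weights(edges, [p1, p2, p3])
stable = converged([p1, p2, p3])
```

Comparing the induced groups rather than the raw labels matters because different runs may assign different label names to the same community. In a full implementation the detection algorithm would be re-run on `new_weights` until `converged` returns true.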



A. Size distribution 

It is common for social networks to exhibit communities 
at different scales, and quite often the distribution 
of community sizes is a power-law. Figure 2 shows the 
size distributions of communities in the final snapshot 
of the social and the placefriends graphs in Gowalla; the 
distributions do not change significantly between any of 
the snapshots. Notice that the distributions have power-law 
tails, and that more than 95% of both social and local 
communities have fewer than 30 members. However, 
the exponent of the distribution of social communities 
($\gamma = 1.71$) is smaller than that of the distribution of 
placefriends communities ($\gamma = 2.08$), indicating that the 
size of social communities is more heterogeneous. We are 
particularly interested in small communities when considering 
local communities: in a study of communities 
formed by people communicating using mobile phones, 
Onnela et al. [38] found that communities of up to 30 
people tended to be geographically tight, with the span 
of user locations becoming larger more quickly once this 
size is exceeded. Indeed, Figure 2 confirms that the size 
of placefriends communities is consistently smaller than 
that of social communities. This might reflect the ease of 
establishing an online tie, while the constraint that users 
within local communities have been to the same places 
means that these communities are necessarily smaller. 

FIG. 2. (colour online) Distribution of community sizes of the 
social (black circles) and placefriends graphs (red squares) in 
the final snapshot of the dataset. The tails of both distributions 
are power-laws, with exponents $\gamma = 1.71$ for social 
communities (solid blue line) and $\gamma = 2.08$ for placefriends 
communities (dashed orange line). This indicates that the size of 
social communities is more heterogeneous. Notice that placefriends 
communities are consistently smaller than social communities. 

B. Shared places 

Figure 3 shows how many intra-community social ties 
in social and local communities are between placefriends 
(recall that placefriends are two users who have been to 
one of the same places, not necessarily having a tie in 
the social graph). Almost all of the social ties between 
members of local communities connect users who are also 
placefriends. In contrast, there are many social commu- 
nities that have no such ties. For instance, we find that 
in more than 30% of the communities in the social graph, 
under 10% of the internal edges lie between nodes that 
are also placefriends, and in more than half the social 
communities, under 50% of the social ties are between 
placefriends. In the local communities, however, we can 
see that in over 80% of the communities, more than 90% 
of the internal social ties are between placefriends. We 
thus see that if we perform community detection on the 
social graph alone we are not able to identify many of 
the local communities. 
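
The per-community quantity discussed in this section, i.e. the fraction of internal social ties whose endpoints are also placefriends, can be sketched as follows. The function name, the community labels and the toy check-in data are hypothetical.

```python
def placefriend_fraction(social_edges, community_of, checkins):
    """For each community, the fraction of internal social ties between placefriends."""
    internal = {}  # internal social ties per community
    shared = {}    # internal social ties whose endpoints share at least one venue
    for u, v in social_edges:
        c = community_of[u]
        if c != community_of[v]:
            continue  # ties between different communities are not counted
        internal[c] = internal.get(c, 0) + 1
        if checkins.get(u, set()) & checkins.get(v, set()):
            shared[c] = shared.get(c, 0) + 1
    return {c: shared.get(c, 0) / n for c, n in internal.items()}

# Toy example with two communities (hypothetical users and venues).
social_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
checkins = {"a": {"cafe"}, "b": {"cafe"}, "c": {"lab"}, "d": {"park"}, "e": {"park"}}

fractions = placefriend_fraction(social_edges, community_of, checkins)
```

Binning these per-community fractions over all communities of a graph gives the kind of histogram described in the text, in which social communities concentrate at low fractions and local communities at high fractions.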

FIG. 3. (colour online) Proportions of communities having 
particular fractions of intra-community social ties where connected 
users are also placefriends (have been to one of the 
same places), in the final snapshot of the social (top panel) 
and placefriends graph (bottom panel). Notice that in more 
than 30% of social communities fewer than 10% of the internal 
social ties connect placefriends, while in more than 80% 
of local communities more than 90% of the internal social ties 
connect placefriends. 

Since Gowalla is an explicitly location-focused social 
network with an emphasis on location sharing, one might 
expect users to be friends with those who go to the same 
places. Indeed, this has been shown to be the case by previous 
research into online location-based social networks: 
Scellato et al. [39] found that during the steady growth 
period of the service, 30% of links are added between 
placefriends. However, social communities mainly contain 
ties between users who do not visit the same places. 
This is a first piece of evidence that performing community 
detection on the social graph may not capture local 
communities, even though spatially close users are more 
likely to form ties than distant users [40]. 

V. TEMPORAL EVOLUTION OF 
COMMUNITIES 

We study the formation and deletion of edges in the 
social network in each type of community, by examin- 
ing the social and local communities present in the first 
snapshot, and identifying the social ties that have been 
created and deleted by the time of the final snapshot. We 
consider ties according to whether they are within or be- 
tween social and local communities. Note that the graph 
we use here is the subset of the social graph G composed 
of nodes that are in communities at the first snapshot. 
Isolated pairs of nodes and nodes with no ties are not 
considered to be in communities. 



A. Null model 

We compare the actual number of links created or 
deleted within or between communities since the first 
snapshot to the corresponding expected number of links 
created or deleted in a null model, where new links 
(resp. old links) are added (resp. removed) at random. 
The only constraint is that we avoid self-loops and mul- 
tiple edges between the same pair of nodes. If $x$ is the 
observable of interest (i.e. number of links meeting the 
given criteria), we denote by $\tilde{x}$ the expected value of $x$ in 
the null model, and we compute: 

\[ r = \frac{x}{\tilde{x}} \]

For instance, if we want to assess the significance of edge 
creation between nodes belonging to the same community, 
the observable $x$ is the total number of edges added 
between nodes in the same community since the first 
snapshot, while $\tilde{x}$ is the expected number of edges created 
between nodes in the same community when the 
same number of new edges are placed uniformly at random. 
That is, the total number of edges added since the 
first snapshot, multiplied by the fraction of those edges 
that could be formed between nodes belonging to the 
same community. This fraction is determined by the 
number of 'missing' edges within communities, i.e. the 
number of pairs in the same communities that are not 
connected in the first snapshot, relative to the total number 
of missing edges. Thus, the expression for 
$\tilde{x}$ reads: 

\[ \tilde{x} = K_{+} \, \frac{\overline{K}^{in}}{\overline{K}} \]

where $\overline{K}^{in}$ is the number of edges missing between members 
of the same community, $\overline{K}$ is the total number of 
missing edges (between members of the same community 
and members of different communities), and $K_{+}$ is the 
number of edges added between the first and the final 
snapshots. Instead, if we consider intra-community edge 
deletion, $\tilde{x}$ is equal to the total number of edges deleted 
by the final snapshot multiplied by the fraction of edges 
which lie within a community in the first snapshot. As a 
formula: 

\[ \tilde{x} = K_{-} \, \frac{K^{in}}{K} \]

where $K^{in}$ is the number of edges between members of 
the same community in the first snapshot, $K$ is the total 
number of edges in the first snapshot and $K_{-}$ is the 
number of edges that have actually been deleted between 
the first and the final snapshot. 

The same model applies when considering the case of 
edges created or deleted between communities, but with 
the obvious replacements of the edge counts for within 
communities with those for between communities. In 
summary, r is the ratio of the number of edges we ac- 
tually see being created or deleted between members of 
the same or different communities, to that we would ex- 
pect if edges were to be added or removed at random. 
We can use this to assess the significance of communities 
for the creation and deletion of edges. 
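
As an illustration only, the ratio r for edge creation and deletion could be computed along the following lines. This is a minimal sketch, not the code used for the analysis: the function names `creation_ratio` and `deletion_ratio`, the `membership` dictionary (node to community label) and the `within` flag are hypothetical, and edges are assumed to be undirected pairs listed once each, with every node of the graph present in `membership`.

```python
import math
from collections import Counter


def _is_within(membership, edge):
    """True if both endpoints of the edge share a community label."""
    u, v = edge
    return membership[u] == membership[v]


def creation_ratio(membership, first_edges, added_edges, within=True):
    """Observed / expected number of added edges within (or between) communities.

    The expectation assumes the added edges are placed uniformly at random
    on node pairs that are not connected in the first snapshot.
    """
    n = len(membership)
    sizes = Counter(membership.values())
    pairs_within = sum(s * (s - 1) // 2 for s in sizes.values())
    pairs_total = n * (n - 1) // 2
    first_within = sum(_is_within(membership, e) for e in first_edges)
    miss_within = pairs_within - first_within      # missing intra-community pairs
    miss_total = pairs_total - len(first_edges)    # all missing pairs
    if miss_total == 0:
        return math.nan
    miss = miss_within if within else miss_total - miss_within
    x = sum(_is_within(membership, e) == within for e in added_edges)
    x_bar = len(added_edges) * miss / miss_total
    return x / x_bar if x_bar else math.nan


def deletion_ratio(membership, first_edges, deleted_edges, within=True):
    """Observed / expected number of deleted edges within (or between) communities.

    The expectation assumes the deleted edges are chosen uniformly at random
    among the edges of the first snapshot.
    """
    k_total = len(first_edges)
    if k_total == 0:
        return math.nan
    k_selected = sum(_is_within(membership, e) == within for e in first_edges)
    x = sum(_is_within(membership, e) == within for e in deleted_edges)
    x_bar = len(deleted_edges) * k_selected / k_total
    return x / x_bar if x_bar else math.nan


if __name__ == "__main__":
    # Toy example with two communities of three nodes each.
    membership = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    first = [("a", "b"), ("d", "e")]
    added = [("a", "c"), ("b", "c"), ("e", "f"), ("c", "d")]
    print(creation_ratio(membership, first, added, within=True))
    print(creation_ratio(membership, first, added, within=False))
```

Values of r above 1 indicate that the event is more frequent than in the random null model, and values below 1 that it is less frequent.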

In Figure 4 we give a visual representation of the ratios
r corresponding to the formation and deletion of edges
in the social and in the local graph. In the figure we
set the areas of the green and red circles proportional
to the number of edges created and deleted, respectively,
while the areas of the yellow and cyan circles represent
the expected number of such created or deleted edges
in the social (yellow) and local (cyan) graphs.



B. Edge formation and deletion 

We first look at which social ties form over the course of
the eight snapshots. Figure 4(a) shows the numbers of pairs
of users who are not friends in the first snapshot and have
declared a social tie by the final snapshot. The number
of social ties formed between members of the same social
community is 25.6 times greater than expected when ties
form randomly between disconnected users. The effect is
even stronger for local communities: the number of ties
formed between members of the same local community is 70.7
times greater than expected. This shows that social
communities could be useful for applications such as friend
recommendation in services like Gowalla, but local
communities might be even more valuable to consider.

Then, we investigate the conceptually dual problem:
which of the pairs of friends who are not placefriends at
the start of the snapshots later become placefriends.
Figure 4(b) shows the number of intra- and inter-community
pairs of friends who were not placefriends in the first
snapshot, but who had become placefriends by the final
snapshot. Members of the same social community are only
1.04 times as likely to become placefriends as expected
at random. For local communities, the difference is more
pronounced: members of the same local community are 1.88
times as likely to become placefriends as expected at
random. A plausible explanation for this
difference could be that if a community is already focused
around physical places, members are more likely to go to
places where other members have been than when the
community is only social. This again demonstrates the
potential benefit of specifically considering local
communities for applications such as place recommendation.

Finally, we quantify the deletion of social ties between
the first and last snapshots. Figure 4(c) shows the number
of social ties that exist in the first snapshot that
have been deleted by the final snapshot. Edge deletion
is a comparatively rare event in online social networks in
general, and in Gowalla in particular, with under 1% of
the total edges in the social graph being deleted at all. In
both social and local communities, edges between members
of the same community are less likely to be deleted
than expected at random, and edges between members
of different communities are more likely than expected to
be deleted. Edge deletion in OSNs has not been extensively
studied, due to the lack of available data, but
this does seem to suggest that being in the same community
might indicate that a tie in Gowalla is stronger
than one between users in different communities, leading
to its decreased likelihood of deletion.

FIG. 4. (colour online) A visualisation of the ratio between the actual number of edges formed or deleted (green and red circles
respectively, for social and placefriends graphs) and the corresponding expected number in the null model (yellow and cyan
circles, respectively for social and placefriends graphs). Panel a): the formation of social ties within local communities is much
stronger than expected in the null model (70.7 times larger). Panel b): in the placefriends graph there is a high probability
that two unconnected nodes in the same community become placefriends. Panel c): once formed, placefriends connections are
very stable and unlikely to be severed with time.



C. Community events 

The availability of longitudinal data makes it possible 
to study the stability of social and local communities, i.e. 
to quantify whether the community decomposition of the 
graph observed at the beginning remains stable over time 
or evolves towards a different partition at the end. Other 
research has generally agreed on the main types of event
that may occur as communities change over time [42-45].
We take the definitions of Asur et al. [42] and, denoting
the set of nodes making up community k in snapshot i
by $C_i^k$, we define the following possible situations:

• Continue: $C_{i+1}^j$ is a continuation of $C_i^k$ if and
only if $C_{i+1}^j = C_i^k$, i.e. the set of nodes is the same.

• κ-Merge: $C_i^k$ and $C_i^l$ form a merged community
$C_{i+1}^j$ if $C_{i+1}^j$ contains at least κ% of the nodes
belonging to $C_i^k \cup C_i^l$, and if it contains more than
half the nodes in each of $C_i^k$ and $C_i^l$.

• κ-Split: $C_i^k$ has been split in snapshot i + 1 if κ% of
nodes in $C_i^k$ are present in different communities in
snapshot i + 1. For Split and Merge we take κ = 50
as in [42].

• Form: A new community $C_{i+1}^k$ forms in snapshot
i + 1 if none of the nodes in $C_{i+1}^k$ were grouped in
a community in snapshot i.

• Dissolve: A community $C_i^k$ in snapshot i has
dissolved in snapshot i + 1 if none of the nodes in $C_i^k$
are grouped together in snapshot i + 1.

In order to assess not just whether a community is
exactly the same, as in a Continue event, but whether it
still exists in some form in the next snapshot although
users may have joined or left, we define an event Persist:

• Persist: $C_i^k$ persists in snapshot i + 1 if:

1. There is a community $C_{i+1}^j$ such that more
than half of the nodes in $C_i^k$ are present in
$C_{i+1}^j$.

2. Nodes from $C_i^k$ make up more than half of the
nodes in $C_{i+1}^j$.

These latter conditions ensure that the majority of the 
nodes in the community are still the same, and that the 
community has not become merged into a larger one. 
Note that Continue events are a special case of Persist 
events, in which all the nodes belonging to a given com- 
munity in a snapshot are put in the same community in 
the following snapshot. 
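
The event definitions can be made concrete with a short sketch. The code below is illustrative only and rests on stated assumptions: communities are represented as `frozenset`s of node identifiers, the function names (`is_continue`, `is_persist`, `is_dissolve`, `is_form`) are hypothetical, "grouped together" in the Dissolve definition is read as "at least two of the nodes share a community in the next snapshot", and Form is read as "no node of the new community belonged to any community in the previous snapshot". Split and Merge are left out.

```python
def is_continue(community, next_communities):
    """Continue: exactly the same set of nodes exists in the next snapshot."""
    return any(community == other for other in next_communities)


def is_persist(community, next_communities):
    """Persist: some next-snapshot community holds more than half of the
    nodes of `community`, and those nodes are more than half of it."""
    for other in next_communities:
        shared = len(community & other)
        if 2 * shared > len(community) and 2 * shared > len(other):
            return True
    return False


def is_dissolve(community, next_communities):
    """Dissolve: no two nodes of `community` share a community afterwards."""
    return all(len(community & other) <= 1 for other in next_communities)


def is_form(new_community, previous_communities):
    """Form: none of the nodes were in any community in the previous snapshot."""
    return all(not (new_community & old) for old in previous_communities)


if __name__ == "__main__":
    # A 6-member community breaks into 3 pairs that join different groups.
    six = frozenset("abcdef")
    nxt = [frozenset("abmn"), frozenset("cdop"), frozenset("efqr")]
    print(is_continue(six, nxt), is_persist(six, nxt), is_dissolve(six, nxt))
```

Under these readings every Continue event also satisfies `is_persist`, in line with Continue being a special case of Persist.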

D. Dynamics of social and local communities 

We have been considering placefriends relationships
to continue indefinitely in time: local communities have
been obtained taking users to be placefriends when they
have ever checked in to the same places, regardless of
how long ago that was. We now consider placefriends
relationships to have different lifetimes and examine how
this affects local communities. Specifically, we study the
cases where users are considered to be placefriends only if
they have checked into one of the same places in a period
of 2 weeks (14 days), 1 month (30 days), or 2 months (60
days). Recall that even though we only have the structure
of the social graph during the snapshot periods, we
have check-ins extending back to the beginning of the
service, so it is possible to consider a period of check-ins
before the measurement period began.

FIG. 5. (colour online) Percentages of communities undergoing each type of event at each snapshot, with placefriends
relationships lasting d days for the local communities. Continue and Persist events are the most probable ones both in the social
graph (black line with circles) and in the co-location graph where placefriend relationships are persistent (orange line with
crosses). Conversely, when placefriends relationships are restricted to common check-ins within 2 weeks, one month and two
months (resp. red, blue and purple lines), Form and Dissolve are more common. This confirms that placefriend communities
are more volatile and dynamic than social communities. Split and Merge events were extremely rare in all cases and have been
omitted.
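
A time-limited placefriends relation of this kind might be built from raw check-ins roughly as follows. This is a sketch under assumptions rather than the procedure actually used: check-ins are taken to be `(user, place, timestamp)` tuples, the window is assumed to be the `lifetime_days` days ending at the snapshot time, and the function name `placefriend_edges` and the optional `friends` filter (restricting pairs to declared social ties) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def placefriend_edges(checkins, snapshot_time, lifetime_days=None, friends=None):
    """Return the set of user pairs sharing at least one place.

    checkins      -- iterable of (user, place, timestamp) tuples
    snapshot_time -- datetime of the snapshot; later check-ins are ignored
    lifetime_days -- None keeps every earlier check-in (ties never expire);
                     otherwise only the last `lifetime_days` days are used
    friends       -- optional set of frozenset({u, v}) pairs; if given,
                     only these pairs can become placefriends (assumption)
    """
    start = None
    if lifetime_days is not None:
        start = snapshot_time - timedelta(days=lifetime_days)

    visitors = defaultdict(set)  # place -> users seen there in the window
    for user, place, when in checkins:
        if when > snapshot_time:
            continue
        if start is not None and when < start:
            continue
        visitors[place].add(user)

    edges = set()
    for users in visitors.values():
        for u, v in combinations(sorted(users), 2):
            pair = frozenset((u, v))
            if friends is None or pair in friends:
                edges.add(pair)
    return edges


if __name__ == "__main__":
    snapshot = datetime(2010, 6, 1)
    log = [
        ("a", "cafe", datetime(2010, 5, 25)),
        ("b", "cafe", datetime(2010, 5, 30)),
        ("c", "cafe", datetime(2010, 1, 1)),
    ]
    for d in (14, 30, 60, None):
        print(d, sorted(tuple(sorted(p)) for p in placefriend_edges(log, snapshot, d)))
```

Community detection would then be run separately on the edge set obtained for each lifetime, so that the resulting partitions can be compared from snapshot to snapshot.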

We analysed the occurrence of the community events
for both social and local communities over the course of
the snapshots. Figure 5 shows the percentages of
communities in a snapshot undergoing different types of
event, for different time thresholds for the expiration of a
placefriends tie. Note that, due to the definitions of the
events, not all communities will undergo one of the defined
events at each snapshot. For example, a community may break
up at the next snapshot, but this will not count as a
Dissolve event if any of its nodes are still together in the
same community at the next snapshot. Nor will it count
as a Split event unless enough nodes of the previous
community are present together in a community at the next
snapshot and form more than half of this community. For
example, the case where a 6-member community breaks
into 3 pairs, each of which then joins a different
community, will count neither as a Dissolve nor as a Split,
though the community has not Persisted.

Figure 5 shows that social communities are rather stable,
with a high proportion, 60% or more, remaining unchanged
between pairs of snapshots (Continue) and more
than 69% continuing to exist between each pair of snapshots
with at most minor changes in membership (Persist).
Dissolve events never affect more than 4.5% of communities
between a pair of snapshots. Split and Merge events
are very rare, affecting no more than 0.2% of the
communities in any snapshot, and are therefore not shown in
the figure.

The figure shows that when we consider placefriends 
relationships to have a limited duration, local commu- 
nities become highly volatile: when users must have 
checked in to the same place within the past 2 weeks to be 
considered placefriends, under 20% of local communities 
persist from snapshot to snapshot. The proportion re- 
mains under 25% even when the duration is increased to 
2 months, which is quite a generous period of time. This
is in stark contrast to the very high proportions of
persistent communities observed when placefriends relation-
ships are assumed to continue indefinitely; in that case, 
local communities are more stable than social communi- 
ties, with over 80% of communities in any one snapshot 
persisting to the next, and below 2% dissolving. 

The instability of local communities when placefriends 
edges have limited duration may be due to users not con- 
sistently checking in at locations when they go there, 
rather than their ceasing to go to the same places. This 
would reflect how people use Gowalla as a service, rather 
than their true mobility, or indeed their relationships 
with their online friends. Just because users have not 
checked in at the same place for a while, it does not nec- 
essarily mean they no longer see one another, that they 
are no longer friends, or that they no longer visit that 
place, and so we must be careful what we infer from the 
instability of these communities. To investigate what is 
happening in more detail, we examined firstly whether 
or not users in local communities that dissolved between 
snapshots had stopped using the service, and secondly, 
if they had not, whether they were still checking in to 
places in the same area. We found that in most cases, 
users continued to make check-ins in the same geographic 
area as they had been previously. This suggests
that the dissolution of a local community probably does
not mean that its members are no longer in the same area.
Furthermore, for all of the 14-, 30- and 60-day lifespans 
of placefriends relationships, between 30% and 35% of 
the communities that dissolved after the first snapshot 
reappeared in one of the later snapshots. This may indi- 
cate that the users are still visiting the places that they 
have in common, but are not regularly checking in using 
Gowalla. Previous research by Lindqvist et al. [46] into
how people use Foursquare, another location-based social 
network similar to Gowalla, found that people had many 
reasons for not checking in at locations, ranging from pri- 
vacy concerns to the fact that they found it 'boring' to 
keep on checking in repeatedly. The unstable local com- 
munities that we see here may well be a consequence of 
this type of behaviour. 



VI. CONCLUSIONS 



We have analysed social and local communities in 
Gowalla, an online location-based social network, and 
demonstrated that the two community structures do not 
yield the same user groupings. Despite the tendency of 
spatially close users to form social connections, systems 
that aim to make use of the existence of both types of 
community may not be able to rely on simply consider- 
ing the community structure of the social network, but 
should explicitly take geography into account. We have 
seen that local communities could be valuable for friend 
suggestion and place recommendation, since edges are 
more likely to form within local communities than within 
social communities, and friends in the same local com- 
munity are more likely to visit the same places. Finally, 
we have shown that while the social graph changes slowly 
and thus social communities are quite stable, local com- 
munities can be very transient or very stable depend- 
ing on the lifetime given to the placefriends relationship. 
This has implications for systems aiming to make use of 
local communities: the choice of timescale at which the 
placefriends relationship is considered may be crucial due 
to the way in which users perform check-ins. 

These results suggest that location-aware applications 
aiming to exploit the existence of community structure 
in OSNs should not rely only on the detection of social 
communities: these communities can fail to capture local 
groups. By taking geographic information into account, 
local communities can be extracted and these may be 
more useful than social communities in applications such 
as providing personalised friend suggestions or place rec- 
ommendations. However, systems making use of local 
communities should carefully choose the time-scale at 
which they perform community detection, according to 
the particular needs of the application. This work makes 
a step towards online social services and systems being 
able to make better use of community information, as 
they become increasingly location-aware.